optimizations and formatting frontend and backend #114

Merged

ParagGhatage merged 2 commits into main from optimizations on Aug 16, 2025

Conversation

@ParagGhatage (Collaborator) commented Aug 11, 2025

Tasks done:

  • Optimized frontend and backend.
  • Deleted unused files.
  • Formatted code with Prettier and Ruff.

Summary by CodeRabbit

  • New Features

    • Analyze and chat requests now use a runtime-configurable backend base URL for easier environment setup.
  • Bug Fixes / Reliability

    • Improved content extraction with an additional fallback method to increase success for article scraping.
  • Documentation

    • Added setup instructions for NEXT_PUBLIC_API_URL and corrected the backend env snippet.
  • Refactor

    • Widespread formatting and import standardization across frontend and backend; no functional changes.

@coderabbitai coderabbitai bot commented Aug 11, 2025

Walkthrough

Frontend analyze pages now use a runtime-configurable backend base URL (NEXT_PUBLIC_API_URL) for /api/process, /api/bias, and /api/chat calls and add a console.log in chat handling. Backend changes are mostly formatting, add a bs4 extraction fallback in the scraper, and remove the backend/start.sh script.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| **Frontend analyze pages**<br>`frontend/app/analyze/loading/page.tsx`, `frontend/app/analyze/results/page.tsx` | Add `backend_url = process.env.NEXT_PUBLIC_API_URL`; replace hard-coded endpoints with `${backend_url}/api/process`, `${backend_url}/api/bias`, and `${backend_url}/api/chat`. |
| **Frontend landing / formatting-only**<br>`frontend/app/page.tsx` | Reformat imports, literals, and JSX layout; wrap long strings and comments. No behavioral changes. |
| **Docs: README updates**<br>`README.md` | Update env var instruction to set `NEXT_PUBLIC_API_URL=http://localhost:8000` and adjust backend env-file path/formatting. |
| **Scraper: added BS4 fallback**<br>`backend/app/modules/scraper/extractor.py` | Add `extract_with_bs4` as a third extraction method (readability + BeautifulSoup), used as a fallback after trafilatura and newspaper; returns title/text when available. |
| **Backend: vector/embed/chunk adjustments**<br>`backend/app/modules/vector_store/chunk_rag_data.py`, `backend/app/modules/vector_store/embed.py`, `backend/app/utils/store_vectors.py` | Reformat chunk and vector construction (dict/list formatting); minor f-string fixes for error/log messages; no semantic API changes. |
| **Backend: langgraph & nodes formatting**<br>`backend/app/modules/langgraph_builder.py`, `backend/app/modules/langgraph_nodes/*` | Formatting and refactor of node registrations, joins, and return dicts (line-wrapping/trailing commas); no behavior changes. |
| **Backend: chat/fact-check/search formatting**<br>`backend/app/modules/chat/*`, `backend/app/modules/facts_check/web_search.py`, `backend/app/modules/bias_detection/check_bias.py` | Minor prompt string reformatting, single-line argument formatting, and small whitespace changes; semantics unchanged. |
| **Backend: other formatting-only edits**<br>`backend/app/db/vector_store.py`, `backend/app/modules/*` (various), `backend/main.py`, `backend/app/routes/routes.py`, `backend/app/utils/*` | Wide set of formatting-only edits (line breaks, trailing commas, collapsed multi-line literals) with no API or control-flow changes. |
| **Removed startup script**<br>`backend/start.sh` | File removed entirely; previous startup logic (uv install/run) deleted. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  actor User
  participant FE_Load as Analyze Loading Page (FE)
  participant BE as Backend

  User->>FE_Load: Open analyze loading
  par
    FE_Load->>BE: POST ${backend_url}/api/process { url: storedUrl }
    BE-->>FE_Load: process result
  and
    FE_Load->>BE: POST ${backend_url}/api/bias { url: storedUrl }
    BE-->>FE_Load: bias result
  end
  FE_Load-->>User: Navigate to results
```
```mermaid
sequenceDiagram
  actor User
  participant FE_Results as Analyze Results Page (FE)
  participant BE as Backend

  User->>FE_Results: Send chat message
  FE_Results->>BE: POST ${backend_url}/api/chat { message }
  BE-->>FE_Results: chat response
  FE_Results->>FE_Results: console.log(data)
  FE_Results-->>User: Render response
```
```mermaid
sequenceDiagram
  participant Scraper as extractor
  participant trafilatura
  participant newspaper
  participant bs4 as bs4/readability

  Scraper->>trafilatura: try extract
  trafilatura-->>Scraper: text? (if yes -> return)
  Scraper->>newspaper: try extract
  newspaper-->>Scraper: text? (if yes -> return)
  Scraper->>bs4: try extract_with_bs4 (readability + BeautifulSoup)
  bs4-->>Scraper: text? (if yes -> return)
  Scraper-->>Scraper: return failure shape (if all fail)
```
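
For context, here is a minimal sketch of what the new third stage could look like, assuming the `readability-lxml` `Document` API and the BeautifulSoup calls quoted later in this review; the actual signature and return shape in `extractor.py` may differ:

```python
from bs4 import BeautifulSoup
from readability import Document  # provided by the readability-lxml package


def extract_with_bs4(html: str) -> dict | None:
    """Third-stage fallback: readability isolates the main article HTML,
    then BeautifulSoup flattens it to plain text."""
    try:
        doc = Document(html)
        soup = BeautifulSoup(doc.summary(), "html.parser")
        text = soup.get_text(separator="\n").strip()
        if not text:
            return None  # let the caller fall through to the failure shape
        return {"title": doc.title(), "text": text}
    except Exception:
        return None
```

The caller tries trafilatura, then newspaper, then this helper, returning the first non-empty result, as in the diagram above.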

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble code like clover leaves,
Hop to envs the burrow weaves.
Process, bias, chat take flight,
One base URL guides the night.
New fallback digs — BS4 delight. 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (1)
frontend/app/analyze/loading/page.tsx (1)

104-132: Fix effect cleanup: intervals/timeouts are not cleared on unmount

The cleanup function returned inside runAnalysis is never registered with useEffect, so progressInterval leaks; redirect timeout also isn’t cleared. Capture timer IDs in the effect scope and clear them in the useEffect return.

Minimal changes within this block:

-        const stepInterval = setInterval(() => {
+        stepInterval = setInterval(() => {
           setCurrentStep((prev) => {
             if (prev < steps.length - 1) {
               return prev + 1;
             } else {
               clearInterval(stepInterval);
-              setTimeout(() => {
-                router.push("/analyze/results");
-              }, 2000);
+              redirectTimeout = setTimeout(() => {
+                router.push("/analyze/results");
+              }, 2000);
               return prev;
             }
           });
         }, 2000);

-        const progressInterval = setInterval(() => {
+        progressInterval = setInterval(() => {
           setProgress((prev) => {
             if (prev < 100) {
               return prev + 1;
             }
             return prev;
           });
         }, 100);
-
-        return () => {
-          clearInterval(stepInterval);
-          clearInterval(progressInterval);
-        };

Additions needed outside this range:

// Declare in the outer scope of the useEffect
let stepInterval: ReturnType<typeof setInterval> | null = null;
let progressInterval: ReturnType<typeof setInterval> | null = null;
let redirectTimeout: ReturnType<typeof setTimeout> | null = null;

// And return a proper cleanup from the useEffect:
return () => {
  if (stepInterval) clearInterval(stepInterval);
  if (progressInterval) clearInterval(progressInterval);
  if (redirectTimeout) clearTimeout(redirectTimeout);
};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba87804 and 5deefb5.

📒 Files selected for processing (3)
  • frontend/app/analyze/loading/page.tsx (2 hunks)
  • frontend/app/analyze/results/page.tsx (4 hunks)
  • frontend/app/page.tsx (11 hunks)
🔇 Additional comments (6)
frontend/app/analyze/loading/page.tsx (1)

75-80: Ensure backend CORS is properly configured for cross-origin requests

Switching to a separate backend origin (process.env.NEXT_PUBLIC_API_URL) means your server must explicitly allow requests from your frontend. Verify that both /api/process and /api/bias endpoints return the appropriate CORS headers:

  • Access-Control-Allow-Origin set to your frontend’s origin (or * if that aligns with your security model)
  • Access-Control-Allow-Headers including at least Content-Type (and any custom/auth headers you use)
  • Access-Control-Allow-Credentials: true if you plan to send cookies or authentication tokens

I wasn’t able to confirm a .env.example entry for NEXT_PUBLIC_API_URL. Please document this variable in your .env.example so that team members know to configure the correct backend URL.
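
For reference, a minimal sketch of such a CORS setup, assuming the backend is FastAPI (suggested by `backend/main.py` and the routes module, but an assumption here) and a hypothetical `FRONTEND_ORIGIN` env var:

```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# FRONTEND_ORIGIN is a hypothetical env var; the default matches local Next.js dev.
frontend_origin = os.getenv("FRONTEND_ORIGIN", "http://localhost:3000")

app.add_middleware(
    CORSMiddleware,
    allow_origins=[frontend_origin],
    allow_methods=["POST", "OPTIONS"],  # /api/process, /api/bias, /api/chat are POSTs
    allow_headers=["Content-Type"],
    allow_credentials=True,  # only needed if cookies/auth tokens are sent
)
```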

frontend/app/analyze/results/page.tsx (4)

7-7: Import consolidation LGTM

Consolidated lucide-react import is clean and consistent.


49-51: Minor: isLoading gate set LGTM

Setting isLoading to false only when both artifacts exist is fine; the later effect rehydrates and guards redirect.


67-75: Rehydration and redirect flow LGTM

The guard ensures consistent state and redirects if session data is missing.


112-114: Return block change LGTM

Formatting-only change around the return; no functional impact.

frontend/app/page.tsx (1)

1-301: Formatting-only changes LGTM

No functional differences introduced. The reflowed imports/strings and minor JSX formatting are consistent with Prettier/Ruff output.

Comment on lines +19 to +20
const backend_url = process.env.NEXT_PUBLIC_API_URL;


🛠️ Refactor suggestion

Normalize base URL and add safe fallback to same-origin

Using process.env on the client is inlined at build time. If NEXT_PUBLIC_API_URL isn’t set, the current code builds "undefined/api/..." and issues a bad relative request. Also, trailing slashes can produce double "///". Normalize and provide a safe fallback.

Apply:

-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+// Normalize base URL (strip trailing slashes). Fallback to same-origin in browser.
+const backendUrl = (process.env.NEXT_PUBLIC_API_URL ?? "").replace(/\/+$/, "");
+const apiUrl = (path: string) =>
+  new URL(path, backendUrl || window.location.origin).toString();

Optionally, centralize apiUrl in a shared util (e.g., src/lib/api.ts) to DRY across pages.


Comment on lines +75 to +80
axios.post(`${backend_url}/api/process`, {
url: storedUrl,
}),
axios.post(`${backend_url}/api/bias`, {
url: storedUrl,
}),

🛠️ Refactor suggestion

Use the normalized apiUrl helper to avoid double slashes and bad "undefined" paths

This ensures correctness regardless of env presence and trailing slashes.

-            axios.post(`${backend_url}/api/process`, {
-              url: storedUrl,
-            }),
-            axios.post(`${backend_url}/api/bias`, {
-              url: storedUrl,
-            }),
+            axios.post(apiUrl("/api/process"), { url: storedUrl }),
+            axios.post(apiUrl("/api/bias"),    { url: storedUrl }),

import BiasMeter from "@/components/bias-meter";
import axios from "axios";

const backend_url = process.env.NEXT_PUBLIC_API_URL;

🛠️ Refactor suggestion

Normalize base URL and add apiUrl helper (same as loading page)

Prevent "undefined/api/..." and double slashes; provide same-origin fallback.

-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+const backendUrl = (process.env.NEXT_PUBLIC_API_URL ?? "").replace(/\/+$/, "");
+const apiUrl = (path: string) =>
+  new URL(path, backendUrl || window.location.origin).toString();

Comment on lines +78 to +94
```diff
   async function handleSendMessage(e: React.FormEvent) {
     e.preventDefault();
     if (!message.trim()) return;
     const newMessages = [...messages, { role: "user", content: message }];
     setMessages(newMessages);
     setMessage("");

-    const res = await axios.post("http://Thunder1245-perspective-backend.hf.space/api/chat", {
-      message: message
+    const res = await axios.post(`${backend_url}/api/chat`, {
+      message: message,
     });
     const data = res.data;

-    console.log(data)
+    console.log(data);

     // 🔹 Step 2: Append LLM’s response
     setMessages([...newMessages, { role: "assistant", content: data.answer }]);
-  };
+  }
```

⚠️ Potential issue

Add error handling and dev-only logging in chat handler; use apiUrl helper

Current code throws on network errors and logs verbosely in production. Guard with try/catch and normalize the endpoint.

-  async function handleSendMessage(e: React.FormEvent) {
+  async function handleSendMessage(e: React.FormEvent) {
     e.preventDefault();
     if (!message.trim()) return;
     const newMessages = [...messages, { role: "user", content: message }];
     setMessages(newMessages);
     setMessage("");
-
-    const res = await axios.post(`${backend_url}/api/chat`, {
-      message: message,
-    });
-    const data = res.data;
-
-    console.log(data);
-
-    // 🔹 Step 2: Append LLM’s response
-    setMessages([...newMessages, { role: "assistant", content: data.answer }]);
+    try {
+      const { data } = await axios.post(apiUrl("/api/chat"), { message });
+      if (process.env.NODE_ENV !== "production") {
+        console.debug("chat response:", data);
+      }
+      setMessages([
+        ...newMessages,
+        { role: "assistant", content: data.answer },
+      ]);
+    } catch (error) {
+      console.error("Failed to send chat message:", error);
+      setMessages([
+        ...newMessages,
+        {
+          role: "assistant",
+          content: "Sorry, I couldn’t reach the server. Please try again.",
+        },
+      ]);
+    }
   }

Optional: add an isSending flag to disable the form while awaiting the response.


Comment on lines +85 to +94
```diff
     const res = await axios.post(`${backend_url}/api/chat`, {
       message: message,
     });
     const data = res.data;

-    console.log(data)
+    console.log(data);

     // 🔹 Step 2: Append LLM’s response
     setMessages([...newMessages, { role: "assistant", content: data.answer }]);
-  };
+  }
```

💡 Verification agent

🧩 Analysis chain

Ensure backend CORS covers chat endpoint

Switching to a separate origin requires proper CORS on /api/chat similar to other endpoints.


🏁 Script executed:

#!/bin/bash
# Sanity-check we migrated all frontend chat/process/bias endpoints to use NEXT_PUBLIC_API_URL
rg -n --hidden --glob '!.git' '(axios\.post|fetch)\(.*?/api/(chat|process|bias)' -A1 -B2
# Show any remaining hard-coded http(s):// endpoints
rg -n --hidden --glob '!.git' 'https?://.*/api/(chat|process|bias)' -A0 -B0

Length of output: 986


Ensure backend CORS covers the chat endpoint

All frontend calls to /api/chat, /api/process, and /api/bias now use the backend_url environment variable, so they run against a separate origin. You must update your backend’s CORS configuration to allow requests from your Next.js app’s origin.

Please verify the CORS settings for the following locations:

  • frontend/app/analyze/results/page.tsx (around lines 85–94)
  • frontend/app/analyze/loading/page.tsx (around lines 73–79)

Make sure the /api/chat endpoint’s CORS policy mirrors the configuration used for /api/process and /api/bias.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (2)
backend/app/modules/scraper/extractor.py (2)

26-26: Critical: headers passed incorrectly to requests.get (being treated as query params).

Pass headers via the named argument; otherwise the custom User-Agent is never sent.

-            res = requests.get(self.url, self.headers, timeout=10)
+            res = requests.get(self.url, headers=self.headers, timeout=10)

24-31: SSRF hardening recommended for URL fetches.

If the URL originates from user input, add guards (scheme allowlist http/https, block private IPs/localhost, DNS pinning, redirect limits) to avoid SSRF.
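
A rough sketch of such a guard using only the standard library (the function name and exact policy are illustrative, not the project's code):

```python
import ipaddress
import socket
from urllib.parse import urlparse


def assert_public_http_url(url: str) -> None:
    """Basic SSRF guard: allow only http(s) URLs that resolve to public IPs.

    Note: this resolves DNS once; to also defeat DNS rebinding, pin the
    resolved IP when making the actual request, and re-check redirect targets.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    for *_, sockaddr in socket.getaddrinfo(parsed.hostname, None):
        ip = ipaddress.ip_address(sockaddr[0])
        if not ip.is_global:
            raise ValueError(f"Blocked non-public address: {ip}")
```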

🧹 Nitpick comments (8)
README.md (3)

137-145: Fix list indentation, fenced code language, stray asterisk, and .env spacing

  • MD007: remove extra list indentation.
  • MD040: specify a language for fenced block.
  • Typo: “Setup environment variables:*” should be italic like other sections.
  • .env: remove spaces around “=”.
  • Spacing: add space in “/frontend directory”.
-Setup environment variables:*
-  - add .env file in `/frontend`directory.
-  - add following environment variable in your .env file.
-```
-NEXT_PUBLIC_API_URL = http://localhost:8000
-
-```
+*Setup environment variables:*
+- add .env file in `/frontend` directory.
+- add the following environment variable in your `.env` file.
+```env
+NEXT_PUBLIC_API_URL=http://localhost:8000
+```

142-142: Clarify dev vs prod URL

Consider adding a brief note that http://localhost:8000 is for local development; use your deployed backend’s HTTPS URL in production to avoid mixed content/CORS issues.


163-170: Align backend env block with markdownlint and dotenv conventions

  • MD007: remove extra list indentation.
  • MD040: add language to the fenced block.
  • .env: remove spaces around “=” and fix spacing in “/backend directory”.
  • Remove stray indentation before closing fence.
-  - add .env file in `/backend`directory.
-  - add following environment variable in your .env file.
-  ```
-GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
-  ```
+- add .env file in `/backend` directory.
+- add the following environment variables in your `.env` file.
+```env
+GROQ_API_KEY=<groq_api_key>
+PINECONE_API_KEY=<your_pinecone_API_KEY>
+PORT=8000
+SEARCH_KEY=<your_Google_custom_search_engine_API_key>
+```

Optional: include a .env.example with these keys and ensure .env is gitignored.

backend/app/modules/scraper/extractor.py (3)

17-22: UA header OK; consider configurability and locale hints.

Make the UA configurable (env/config) and consider adding an Accept-Language to improve extraction on locale-sensitive sites.

         self.headers = {
-            "User-Agent": (
+            "User-Agent": (
                 "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
                 " AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/113.0 Safari/537.36"
             )
-        }
+        ,
+            "Accept-Language": "en-US,en;q=0.9",
+        }

71-76: Prefer lxml parser when available; normalize text to reduce noise.

html.parser works, but lxml is faster and more robust if installed. Also consider trimming empty lines to reduce boilerplate.

-            soup = BeautifulSoup(doc.summary(), "html.parser")
-            title = doc.title()
-            text = soup.get_text(separator="\n")
+            try:
+                soup = BeautifulSoup(doc.summary(), "lxml")
+            except Exception:
+                soup = BeautifulSoup(doc.summary(), "html.parser")
+            title = doc.title()
+            raw = soup.get_text(separator="\n")
+            text = "\n".join(line.strip() for line in raw.splitlines() if line.strip())

92-92: Standardize the failure payload shape for easier client handling.

Consider returning consistent keys even on failure.

-        return {"url": self.url, "text": "", "error": "Failed to extract article."}
+        return {
+            "url": self.url,
+            "title": "",
+            "text": "",
+            "authors": [],
+            "publish_date": None,
+            "error": "Failed to extract article.",
+        }
backend/app/db/vector_store.py (1)

27-27: Formatting change is fine; consider making provider/region configurable

Single-line spec is OK. For multi-region/multi-cloud deployments, consider sourcing cloud/region from env (e.g., PINECONE_CLOUD, PINECONE_REGION) with sane defaults.
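
For example, a small sketch of that idea (`PINECONE_CLOUD` and `PINECONE_REGION` are hypothetical env var names):

```python
import os

from pinecone import ServerlessSpec

# Hypothetical env vars with sane defaults; override per deployment.
spec = ServerlessSpec(
    cloud=os.getenv("PINECONE_CLOUD", "aws"),
    region=os.getenv("PINECONE_REGION", "us-east-1"),
)
```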

backend/app/modules/vector_store/chunk_rag_data.py (1)

48-53: Avoid recreating fact_fields on every loop; add a type check for fact

  • Defining fact_fields inside the loop recreates the list for every iteration; hoist it once above the loop.
  • Add a clear type check to avoid TypeErrors when fact isn’t a dict.

Define once above the loop:

fact_fields = ["original_claim", "verdict", "explanation", "source_link"]
for i, fact in enumerate(data["facts"]):
    if not isinstance(fact, dict):
        raise ValueError(f"Fact at index {i} must be a dict")
    for field in fact_fields:
        if field not in fact:
            raise ValueError(f"Missing required fact field: {field} in fact index {i}")

Within this hunk, remove the per-iteration assignment and add the type check:

-            fact_fields = ["original_claim", "verdict", "explanation", "source_link"]
-            for field in fact_fields:
+            if not isinstance(fact, dict):
+                raise ValueError(f"Fact at index {i} must be a dict")
+            for field in fact_fields:
                 if field not in fact:
                     raise ValueError(
                         f"Missing required fact field: {field} in fact index {i}"
                     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5deefb5 and 959f690.

📒 Files selected for processing (23)
  • README.md (2 hunks)
  • backend/app/db/vector_store.py (2 hunks)
  • backend/app/modules/bias_detection/check_bias.py (2 hunks)
  • backend/app/modules/chat/embed_query.py (0 hunks)
  • backend/app/modules/chat/get_rag_data.py (1 hunks)
  • backend/app/modules/chat/llm_processing.py (2 hunks)
  • backend/app/modules/facts_check/web_search.py (1 hunks)
  • backend/app/modules/langgraph_builder.py (3 hunks)
  • backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
  • backend/app/modules/langgraph_nodes/fact_check.py (1 hunks)
  • backend/app/modules/langgraph_nodes/generate_perspective.py (2 hunks)
  • backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
  • backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
  • backend/app/modules/scraper/cleaner.py (4 hunks)
  • backend/app/modules/scraper/extractor.py (5 hunks)
  • backend/app/modules/scraper/keywords.py (2 hunks)
  • backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
  • backend/app/modules/vector_store/embed.py (1 hunks)
  • backend/app/routes/routes.py (0 hunks)
  • backend/app/utils/fact_check_utils.py (1 hunks)
  • backend/app/utils/store_vectors.py (1 hunks)
  • backend/main.py (2 hunks)
  • backend/start.sh (0 hunks)
💤 Files with no reviewable changes (3)
  • backend/app/modules/chat/embed_query.py
  • backend/app/routes/routes.py
  • backend/start.sh
✅ Files skipped from review due to trivial changes (14)
  • backend/app/modules/langgraph_nodes/store_and_send.py
  • backend/app/utils/store_vectors.py
  • backend/app/modules/chat/llm_processing.py
  • backend/app/modules/scraper/keywords.py
  • backend/app/modules/chat/get_rag_data.py
  • backend/main.py
  • backend/app/modules/langgraph_nodes/fact_check.py
  • backend/app/modules/langgraph_nodes/sentiment.py
  • backend/app/modules/langgraph_nodes/error_handler.py
  • backend/app/modules/facts_check/web_search.py
  • backend/app/utils/fact_check_utils.py
  • backend/app/modules/scraper/cleaner.py
  • backend/app/modules/bias_detection/check_bias.py
  • backend/app/modules/langgraph_builder.py
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py

36-36: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 markdownlint-cli2 (0.17.2)
README.md

139-139: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


140-140: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


163-163: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


164-164: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


165-165: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (7)
backend/app/modules/scraper/extractor.py (3)

43-44: LGTM: explicit JSON output for trafilatura.

This makes the return type deterministic and simplifies downstream handling.


58-61: LGTM: safe ISO-8601 serialization for publish_date.

Handles None cleanly and preserves tz info when present.


85-86: No direct dict indexing of authors/publish_date found; safe to proceed

A repository-wide search for ['authors'], ["authors"], ['publish_date'], or ["publish_date"] returned no matches, indicating there are no strict dict key lookups that would raise a KeyError. Downstream consumers already guard against missing fields (e.g. using .get()), so the new bs4+Readability fallback can safely omit those keys without breaking. No further changes required.

backend/app/modules/vector_store/chunk_rag_data.py (1)

55-67: LGTM; chunk payload shape aligns with downstream embedder

Chunk has id/text/metadata as expected. No functional concerns.

backend/app/modules/vector_store/embed.py (1)

15-16: Minor formatting improvement is fine

Single f-string is clearer. No functional change.

backend/app/modules/langgraph_nodes/generate_perspective.py (2)

46-52: All prompt variable names correctly aligned with generation_prompt

The keys passed to `chain.invoke` (`cleaned_article`, `facts`, and `sentiment`) exactly match the placeholders defined in the template ({cleaned_article}, {facts}, {sentiment}), so there’s no risk of KeyErrors or empty slots.


60-60: Ignore the suggestion to return only the raw string here; downstream nodes require the full PerspectiveOutput object

  • judge_perspective immediately does
    perspective_obj = state.get("perspective")
    text = getattr(perspective_obj, "perspective", "").strip()
    and will break if perspective is a string.
  • chunk_rag_data.py similarly expects state["perspective"] to have .perspective and .reasoning attributes when building vector‐store entries.

The proper place to strip out reasoning is at the external API boundary—not in this internal pipeline node.

Likely an incorrect or invalid review comment.

```diff
 except Exception as e:
-    raise RuntimeError("Error occured while "
-                       f"connecting to the index {INDEX_NAME}:{e}")
+    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
```

💡 Verification agent

🧩 Analysis chain

Use exception chaining and fix typos in error message (Ruff B904)

Adopt raise ... from e and correct spelling to improve debuggability and consistency.

Apply this diff:

-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+    raise RuntimeError(
+        f"Error occurred while connecting to the index {INDEX_NAME}: {e}"
+    ) from e

Also mirror the same improvements for the earlier client initialization exception (Line 13), e.g.:

except Exception as e:
    raise RuntimeError(
        f"Error occurred while initializing Pinecone client: {e}"
    ) from e

🏁 Script executed:

#!/bin/bash
# Display the beginning of the file to locate the client initialization exception around line 13
sed -n '1,50p' backend/app/db/vector_store.py

echo "----"

# Display lines around the reported error at line 36
sed -n '30,70p' backend/app/db/vector_store.py

Length of output: 1414


Use exception chaining and correct typos in error messages

Two locations in backend/app/db/vector_store.py need updates to improve debuggability and consistency:

  • Client initialization block (around line 13)
  • Index connection block (around line 36)

Apply the following diffs:

--- a/backend/app/db/vector_store.py
@@ -9,7 +9,10 @@ PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
 try:
     # Initialize Pinecone client
     pc = Pinecone(api_key=PINECONE_API_KEY)
-
-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+except Exception as e:
+    raise RuntimeError(
+        f"Error occurred while initializing Pinecone client: {e}"
+    ) from e
--- a/backend/app/db/vector_store.py
@@ -34,7 +37,10 @@ if not pc.has_index(INDEX_NAME):
 try:
     # Connect to the index
     index = pc.Index(INDEX_NAME)
-
-except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+except Exception as e:
+    raise RuntimeError(
+        f"Error occurred while connecting to the index {INDEX_NAME}: {e}"
+    ) from e

These changes:

  • Correct “occured” → “occurred” and “intialising” → “initializing”
  • Adopt exception chaining (raise … from e) for original traceback preservation
🧰 Tools
🪛 Ruff (0.12.2)

36-36: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


```diff
-llm = ChatGroq(
-    model=my_llm,
-    temperature=0.7
-)
+llm = ChatGroq(model=my_llm, temperature=0.7)
```

💡 Verification agent

🧩 Analysis chain

Lower temperature for structured output and verify model ID

For schema-constrained generation, a lower temperature materially reduces parse/validation errors. Also, please confirm the Groq model ID is valid.

Apply:

-llm = ChatGroq(model=my_llm, temperature=0.7)
+llm = ChatGroq(model=my_llm, temperature=0.2)

Optionally consider making my_llm configurable via env/config for deployments.

To verify the model ID:


🌐 Web query:

What are the currently supported Groq model IDs for ChatGroq? Is "llama-3.3-70b-versatile" valid?

💡 Result:

Short answer: Yes — llama-3.3-70b-versatile is a valid Groq model ID and is commonly used with ChatGroq. (See Groq's model page and Chat/Playground examples.) (console.groq.com)

Which models are currently available for ChatGroq (representative list, as shown in Groq’s docs and third‑party integrator pages):

Notes and recommendation:

  • Model availability and exact IDs can change. To get the authoritative, up‑to‑date list you can call Groq’s models endpoint (returns all active model IDs): GET https://api.groq.com/openai/v1/models (example curl shown in Groq docs). (groq-preview.mintlify.app)
  • When using LangChain’s ChatGroq, pass the Groq model ID as the model parameter (e.g., model="llama-3.3-70b-versatile"). (python.langchain.com)

If you’d like, I can (a) fetch the live model list right now from Groq’s API for you, or (b) show exact code examples for calling ChatGroq with llama-3.3-70b-versatile in your environment. Which would you prefer?


Lower temperature for structured output (model ID confirmed valid)

The Groq model ID "llama-3.3-70b-versatile" is valid and widely used. To improve schema-constrained generation reliability, please lower the temperature:

File: backend/app/modules/langgraph_nodes/generate_perspective.py
Lines: 16

- llm = ChatGroq(model=my_llm, temperature=0.7)
+ llm = ChatGroq(model=my_llm, temperature=0.2)

Optional: Make my_llm configurable via environment variables or your configuration system for easier deployments.

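A sketch of that optional configuration (`GROQ_MODEL` and `GROQ_TEMPERATURE` are hypothetical env var names; the defaults shown keep behavior predictable):

```python
import os

from langchain_groq import ChatGroq

# Hypothetical env vars; fall back to the reviewed defaults.
my_llm = os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile")
temperature = float(os.getenv("GROQ_TEMPERATURE", "0.2"))

llm = ChatGroq(model=my_llm, temperature=temperature)
```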

Comment on lines +37 to +44
facts_str = "\n".join(
[
f"Claim: {f['original_claim']}\n"
"Verdict: {f['verdict']}\nExplanation: "
"{f['explanation']}"
for f in state["facts"]
]
)

⚠️ Potential issue

Critical: f-string interpolation bug in facts_str (‘verdict’ and ‘explanation’ not rendered)

Only the first segment is an f-string; the others are plain strings, so braces are emitted literally. This changes behavior.

Fix and also avoid an intermediate list:

-        facts_str = "\n".join(
-            [
-                f"Claim: {f['original_claim']}\n"
-                "Verdict: {f['verdict']}\nExplanation: "
-                "{f['explanation']}"
-                for f in state["facts"]
-            ]
-        )
+        facts_str = "\n".join(
+            (
+                f"Claim: {f['original_claim']}\n"
+                f"Verdict: {f['verdict']}\n"
+                f"Explanation: {f['explanation']}"
+            )
+            for f in facts
+        )

Committable suggestion skipped: line range outside the PR's diff.


Comment on lines +34 to 45
```python
        chunks.append(
            {
                "id": f"{article_id}-perspective",
                "text": perspective_obj.perspective,
                "metadata": {
                    "type": "counter-perspective",
                    "reasoning": perspective_obj.reasoning,
                    "article_id": article_id,
                },
            }
        )
```


🛠️ Refactor suggestion

Robustify perspective handling; perspective_data is prepared but unused

You normalize perspective_data = data["perspective"] (Line 16-19) but then use perspective_obj = data["perspective"] and access attributes. If perspective is a dict, the hasattr check fails and attribute access on Line 37/40 will break. Normalize once and read via a unified interface.

Suggested approach outside this hunk (place before the append):

perspective_raw = data["perspective"]
if hasattr(perspective_raw, "dict"):
    perspective_raw = perspective_raw.dict()

# Unified access
if isinstance(perspective_raw, dict):
    perspective_text = perspective_raw.get("perspective")
    reasoning_text = perspective_raw.get("reasoning")
else:
    perspective_text = getattr(perspective_raw, "perspective", None)
    reasoning_text = getattr(perspective_raw, "reasoning", None)

if not perspective_text or not reasoning_text:
    raise ValueError("Perspective object missing required fields")

Then update this block as follows:

-                "text": perspective_obj.perspective,
+                "text": perspective_text,
                 "metadata": {
                     "type": "counter-perspective",
-                    "reasoning": perspective_obj.reasoning,
+                    "reasoning": reasoning_text,
                     "article_id": article_id,
                 },

Comment on lines +23 to +25
vectors.append(
{"id": chunk["id"], "values": embedding, "metadata": chunk["metadata"]}
)

🛠️ Refactor suggestion

Validate presence of 'id' and 'metadata' before use to avoid KeyError

You only validate 'text' earlier, but access 'id' and 'metadata' here. Add validation to the loop at Lines 11-16 to enforce all required keys.

Example adjustment outside this hunk:

# Validate chunk structure
for i, chunk in enumerate(chunks):
    if not isinstance(chunk, dict):
        raise ValueError(f"Invalid chunk at index {i}: expected dict")
    missing = [k for k in ("id", "text", "metadata") if k not in chunk]
    if missing:
        raise ValueError(f"Invalid chunk at index {i}: missing keys {missing}")

@ParagGhatage ParagGhatage merged commit 0e715b0 into main on Aug 16, 2025
1 check passed